
Conversation

@yueshen2016 (Contributor) commented Sep 17, 2025

What does this PR do?

Type of change: Bug fix
Overview: Fix the case where attention.core_attention.softmax_offset is None during Megatron import.

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • Improved robustness when importing/exporting models by safely handling missing or undefined softmax offsets.
    • Prevents errors or crashes in certain attention configurations where an offset was absent.
    • No changes to public APIs; behavior unchanged when a valid offset is provided.
    • Enhances stability and reliability for affected import/export workflows.

@yueshen2016 yueshen2016 requested a review from a team as a code owner September 17, 2025 16:28

copy-pr-bot bot commented Sep 17, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot commented Sep 17, 2025

Walkthrough

Replaces a hasattr check with a getattr(..., None) is not None guard when reading attention.core_attention.softmax_offset, so the self.rules["softmax_offset"] rule is invoked only when the attribute exists and is not None. No public API changes.

Changes

Cohort / File(s): Megatron importer: softmax_offset guard — modelopt/torch/export/plugins/megatron_importer.py
Summary: Use getattr(attention.core_attention, "softmax_offset", None) is not None to decide whether to invoke self.rules["softmax_offset"], so the rule is never called with None and is skipped when the offset is missing.
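
For illustration, a minimal self-contained sketch of the guard pattern described above (SimpleNamespace stands in for the real Megatron core_attention module; the names here are placeholders, not the importer's actual objects):

```python
from types import SimpleNamespace

# Hypothetical stand-in for attention.core_attention: the attribute exists but is None,
# e.g. a softmax_offset slot that was registered without ever being populated.
core_attention = SimpleNamespace(softmax_offset=None)

# Old guard: hasattr() is True even though the value is None,
# so the rule would still be called with None and fail downstream.
assert hasattr(core_attention, "softmax_offset")

# New guard: the rule is invoked only when the offset exists and is not None.
if getattr(core_attention, "softmax_offset", None) is not None:
    raise AssertionError("not reached: a None offset is now skipped")
```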

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Importer as MegatronImporter
  participant Attn as attention.core_attention
  participant Rules as self.rules["softmax_offset"]

  Importer->>Attn: Read softmax_offset
  alt softmax_offset exists and is not None
    Importer->>Rules: Invoke with softmax_offset
    Rules-->>Importer: Return transformed offset
  else softmax_offset missing or None
    Note over Importer: Skip invoking rule
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit twitched its ears with care,
“Call only when there’s something there!”
No None shall hop into the rule,
Guards in place—so calm, so cool.
With careful checks, the fields align;
My burrow’s code now runs just fine. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly identifies the primary change: a bug fix for the case where attention.core_attention.softmax_offset is None in the Megatron importer. It matches the code change in modelopt/torch/export/plugins/megatron_importer.py and is neither misleading nor overly generic.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch yue/fix-megatron-importer

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da6dc1b and 8e8e77d.

📒 Files selected for processing (1)
  • modelopt/torch/export/plugins/megatron_importer.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/torch/export/plugins/megatron_importer.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.

@yueshen2016 yueshen2016 changed the title Fix issues of attention.core_attention.softmax_offset is None for meg… Fix issues of attention.core_attention.softmax_offset is None Sep 17, 2025
self.rules["linear_qkv"](attention.linear_qkv, layer_id)
self.rules["linear_proj"](attention.linear_proj, layer_id)
if hasattr(attention.core_attention, "softmax_offset"):
if (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if getattr(attention.core_attention, "softmax_offset", None) is not None:

Copy link

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modelopt/torch/export/plugins/megatron_importer.py (1)

515-521: Harden the guard to avoid AttributeError when core_attention is absent.

Accessing attention.core_attention inside hasattr(...) can still raise if core_attention itself is missing or None on some attention variants. Use getattr chaining and reuse the computed offset.

-                        if (
-                            hasattr(attention.core_attention, "softmax_offset")
-                            and attention.core_attention.softmax_offset is not None
-                        ):
-                            self.rules["softmax_offset"](
-                                attention.core_attention.softmax_offset, layer_id
-                            )
+                        core_attn = getattr(attention, "core_attention", None)
+                        offset = getattr(core_attn, "softmax_offset", None) if core_attn is not None else None
+                        if offset is not None:
+                            self.rules["softmax_offset"](offset, layer_id)
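
As a quick illustration of the suggested chaining (again with SimpleNamespace stand-ins rather than the real Megatron classes), the hardened guard also stays silent when core_attention itself is absent:

```python
from types import SimpleNamespace

# Hypothetical attention object that has no core_attention attribute at all.
attention = SimpleNamespace()

core_attn = getattr(attention, "core_attention", None)
offset = getattr(core_attn, "softmax_offset", None) if core_attn is not None else None

# Neither getattr call raises; the softmax_offset rule would simply be skipped.
assert offset is None
```
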
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 682bf6d and da6dc1b.

📒 Files selected for processing (1)
  • modelopt/torch/export/plugins/megatron_importer.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: linux
  • GitHub Check: code-quality
  • GitHub Check: build-docs

@yueshen2016 yueshen2016 force-pushed the yue/fix-megatron-importer branch from da6dc1b to 8e8e77d Compare September 17, 2025 16:33
@yueshen2016 yueshen2016 enabled auto-merge (squash) September 17, 2025 16:41
@yueshen2016 (Contributor, Author) commented:

/ok to test 8e8e77d

@yueshen2016 yueshen2016 merged commit d94fc1b into main Sep 17, 2025
21 checks passed
@yueshen2016 yueshen2016 deleted the yue/fix-megatron-importer branch September 17, 2025 20:25

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.82%. Comparing base (682bf6d) to head (8e8e77d).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
- Coverage   73.82%   73.82%   -0.01%     
==========================================
  Files         172      172              
  Lines       17438    17438              
==========================================
- Hits        12874    12873       -1     
- Misses       4564     4565       +1     

☔ View full report in Codecov by Sentry.

yeyu-nvidia pushed a commit that referenced this pull request Sep 18, 2025